Using Lip Reading Recognition to Predict Daily Mandarin Conversation
Abstract
Audio-based automatic speech recognition as a hearing aid is susceptible to background noise and overlapping speech. Consequently, audio-visual speech recognition has been developed to complement the audio input with additional visual information. However, the huge improvement of neural networks in the visual task has resulted in a robust and reliable lip reading framework that can recognize speech from visual input alone. In this work, we propose a lip reading recognition model to predict daily Mandarin conversation and collect a new Daily Mandarin Conversation Lip Reading (DMCLR) dataset, consisting of 1,000 videos from 100 daily conversations spoken by ten speakers. Our model consists of a spatiotemporal convolution layer, an SE-ResNet-18 network, and a back-end module that consists of a bi-directional gated recurrent unit (Bi-GRU), 1D convolution, and fully-connected layers. This model is able to reach 94.2% accuracy on the DMCLR dataset. Such performance makes it possible for Mandarin lip reading applications to be practical in real life. Additionally, we are able to achieve 86.6% and 57.2% accuracy on Lip Reading in the Wild (LRW) and LRW-1000 (Mandarin), respectively. The results show that our method achieves state-of-the-art performance on these two challenging datasets.
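To make the stage order in the abstract concrete, the following is a minimal shape walk-through in plain Python. It is only a sketch: the abstract gives no hyperparameters, so the clip length (29 frames), crop size (88x88), kernel sizes, strides, channel widths, GRU hidden size, and the use of one output class per DMCLR conversation are all assumptions chosen for illustration, and the names `conv_out` and `trace_shapes` are hypothetical.

```python
# Shape walk-through of a lip reading pipeline with the stages named in the
# abstract. Every hyperparameter here is an illustrative assumption, not a
# value taken from the paper.

def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(frames=29, height=88, width=88, gru_hidden=1024, num_classes=100):
    """Return a list of (stage name, output shape) pairs for one input clip."""
    shapes = []

    # Front-end: spatiotemporal (3D) convolution.
    # Assumed kernel 5x7x7, stride 1x2x2, padding 2x3x3, 64 output channels.
    t = conv_out(frames, 5, 1, 2)
    h = conv_out(height, 7, 2, 3)
    w = conv_out(width, 7, 2, 3)
    c = 64
    shapes.append(("conv3d", (c, t, h, w)))

    # Assumed spatial max pooling: 3x3 window, stride 2, padding 1.
    h = conv_out(h, 3, 2, 1)
    w = conv_out(w, 3, 2, 1)
    shapes.append(("maxpool", (c, t, h, w)))

    # SE-ResNet-18 applied per frame: four stages with the usual ResNet-18
    # widths. A squeeze-and-excitation step rescales channels, so it does
    # not change the shape.
    for channels, stride in ((64, 1), (128, 2), (256, 2), (512, 2)):
        h = conv_out(h, 3, stride, 1)
        w = conv_out(w, 3, stride, 1)
        c = channels
        shapes.append((f"se_resnet_stage_{channels}", (c, t, h, w)))

    # Global average pooling over space leaves one feature vector per frame.
    shapes.append(("global_avg_pool", (t, c)))

    # Back-end: a Bi-GRU concatenates forward and backward hidden states.
    shapes.append(("bi_gru", (t, 2 * gru_hidden)))

    # Assumed 1D convolution over time: kernel 3, stride 1, padding 1,
    # channel count unchanged.
    t = conv_out(t, 3, 1, 1)
    shapes.append(("conv1d", (t, 2 * gru_hidden)))

    # Assumed temporal averaging followed by fully-connected layers that
    # produce one score per conversation class.
    shapes.append(("fc", (num_classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes():
        print(f"{name:22s} {shape}")
```

Changing `frames`, `height`, or `width` shows how the assumed front-end stride and the ResNet stages shrink the spatial grid while the time axis is preserved for the recurrent back-end.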
Similar resources
Lip-Reading using Neural Networks
Lip-Reading has been practiced over centuries for teaching deaf and dumb to speak and communicate effectively with the other people. In this study, the use of neural networks in lip reading is explored. We convert the video of the subject speaking different words into images and then images are further selected manually for processing. As per the research the horizontal and the vertical distanc...
Multifactor adaptation for Mandarin broadcast news and conversation speech recognition
We explore the integration of multiple factors such as genre and speaker gender for acoustic model adaptation tasks to improve Mandarin ASR system performance on broadcast news and broadcast conversation audio. We investigate the use of multifactor clustering of acoustic model training data and the application of MPE-MAP and fMPE-MAP acoustic model adaptations. We found that by effectively comb...
Lip-Reading using Neural Networks
Student of Computer Science Engineering, Lingaya’s Institute of Mgt. & Tech., Faridabad, 121002 India †† Student of Computer Science Engineering, Lingaya’s Institute of Mgt. & Tech., Faridabad, 121002 India ††† Student of Computer Science Engineering, Lingaya’s Institute of Mgt. & Tech., Faridabad, 121002 India Faculty of Computer Science Engineering, Lingaya’s University, Faridabad, 121002 Ind...
Designing and implementing a system for automatic recognition of Persian letters by lip-reading using image processing methods
For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among these, lip-reading techniques have encountered many challenges for speech recognition; one of the challenges b...
Using reading behavior to predict grammatical functions
This paper investigates to what extent grammatical functions of a word can be predicted from gaze features obtained using eye-tracking. A recent study showed that reading behavior can be used to predict coarse-grained part of speech, but we go beyond this, and show that gaze features can also be used to make more fine-grained distinctions between grammatical functions, e.g., subjects and objects...
Journal
Journal title: IEEE Access
Year: 2022
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2022.3175867